A Derivation of the Sample Multiple Correlation
Formula for Standard Scores



Francis J. O'Brien, Jr., Ph.D. 



This is the third paper in a series of publications that is
designed to supplement the statistics training of students (see O'Brien, 
1982a, 1982b). The intended audience is social science students studying 
applied statistics. 

What I am attempting to do in this series is present selected 
proofs and derivations of important relationships or formulas that students 
do not find available and/or comprehensible in journals, textbooks and 
so forth. The unique feature of these papers is detailed step by step 
proofs or derivations. Calculus is not assumed nor is it used. Each 
proof or derivation is presented algebraically in great detail. 



Introduction to Proof

Many students have learned that the multiple correlation between
a criterion (or dependent variable) and a finite number of predictors (or
independent variables) can be expressed as the square root of a weighted sum of
regression weights and criterion/predictor product moment correlations. This
relationship holds only for variables that are in standard score (z) form.
This multiple correlation formula for p predictors (i.e., any number) can
be written:

$$R_{Z_y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{B_1 r_{y1} + B_2 r_{y2} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}}$$

Writing the right hand side in summation notation:

$$R_{Z_y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where

$R_{Z_y \cdot Z_1, Z_2, \ldots, Z_j, \ldots, Z_p}$ = multiple correlation of standardized variables

$Z_y$ = the criterion expressed in standard score form

$Z_1, Z_2, \ldots, Z_j, \ldots, Z_p$ = standardized predictors

$B_1, B_2, \ldots, B_j, \ldots, B_p$ = beta (regression) weights attached to each standardized predictor

$r_{y1}, r_{y2}, \ldots, r_{yj}, \ldots, r_{yp}$ = product moment criterion/predictor correlations.

The above multiple correlation formula is presented in many
applied statistics textbooks. It will be derived in this paper. We will 
begin by deriving the relationship for the simplest multivariate case: 
one criterion and two predictors. 

It is always helpful to have a plan in a proof or derivation. 
The general plan we will use can be summarized in the following steps: 

1. state the regression model 

2. derive the normal equations (See the Appendix) 

3. define the multiple correlation 

4. substitute the normal equations into the multiple correlation 

5. simplify. 

Some of these steps will be refined to suit a particular application. 



Derivation for Two Predictors 



Let us review briefly some of the concepts, notation and
logic in regression analysis. We will begin with regression analysis 
for two predictors in raw score form. 

The mathematical function used to obtain the best linear fit 
for two raw score predictors is: 

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

where

$\hat{Y}$ = the predicted criterion

$a$, $b_1$ and $b_2$ = constants to be derived through the least squares
procedure

$X_1, X_2$ = predictor variables

The notation in the above model is stated in abbreviated form.
We have done this to minimize the reading of the symbolism and to
clarify the concepts in the development of the derivation.

The regression model stated above is an idealized model. If
a data set consisting of one criterion and two predictors can be assumed
to be linear, then the model is a reasonable one to apply for prediction
of actual or observed sample scores. It is idealized in the sense that
no error term is included in the model. That is, when an actual or
observed criterion score is compared to the criterion score predicted
by the idealized model, some error is likely to occur; the "fit" is
less than perfect. If we call the actual sample score criterion Y, we
can express the observed raw score model as follows:

$$Y = \hat{Y} + e$$

where

$e$ = amount of numerical error resulting from using the idealized
model ($\hat{Y}$) to predict the actual score ($Y$).

The goal in regression analysis is prediction of all individual 
criterion scores in a distribution with the smallest possible error. The 
error made in predicting observed criterion scores by the idealized model is: 

$$Y - \hat{Y} = e$$

This is the quantity that we want to be as small as possible. The
procedure most often used in the social sciences to accomplish this
is the least squares procedure. The least squares criterion or goal can
be expressed as:

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \text{a minimum}$$

Note: If it is understood that all summations range from i=1 to i=n, then we
can drop the summation limits altogether; n, of course, refers to the sample
size of n sample cases regardless of the number of variables in the regression
model. Later, when the algebra becomes more complex, we use summation limits.

If we substitute the quantity for $\hat{Y}$ we can write:

$$\sum [Y - (a + b_1 X_1 + b_2 X_2)]^2 = \sum (Y - a - b_1 X_1 - b_2 X_2)^2 = \sum e^2 = \text{a minimum}$$



Regression Model for Two Standardized Predictors

If we now convert the variables of the two-predictor raw score
model to standard score form, we can write the two predictor regression
model as:

$$Z_{\hat{y}} = A + B_1\left(\frac{X_1 - \bar{X}_1}{S_1}\right) + B_2\left(\frac{X_2 - \bar{X}_2}{S_2}\right)$$

where

$Z_{\hat{y}}$ = the predicted criterion in standard score form

$A, B_1, B_2$ = the constants to be derived through least squares

$\bar{X}_1, \bar{X}_2$ = sample means

$S_1, S_2$ = sample standard deviations

We can write the regression model as:

$$Z_{\hat{y}} = A + B_1 Z_1 + B_2 Z_2$$

where

$Z_1, Z_2$ = standard scores of the predictors.



The least squares criterion for the standard score regression model
can be written in terms of the difference between actual criterion and
predicted criterion:

$$\sum (Z_y - Z_{\hat{y}})^2 = \sum e_Z^2 = \text{a minimum}$$

Substituting for $Z_{\hat{y}}$:

$$\sum (Z_y - A - B_1 Z_1 - B_2 Z_2)^2 = \sum e_Z^2 = \text{a minimum}$$



The least squares criterion is satisfied mathematically through
calculus (partial derivatives). One by-product of the calculus technique is the
so-called "normal equations". In the Appendix of this paper the reader will find a
description of procedures for finding the
normal equations for the two predictor model as well as models containing
any number of predictors. The reader may find it helpful to study the
Appendix at this point.

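For the two standardized predictor model, the normal equations may
be written as follows:

$$\begin{aligned}
\sum Z_y &= nA + B_1 \sum Z_1 + B_2 \sum Z_2 \\
\sum Z_y Z_1 &= A \sum Z_1 + B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2 \\
\sum Z_y Z_2 &= A \sum Z_2 + B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2
\end{aligned}$$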


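The normal equations can be simplified if the following facts about
standard scores are recalled (see O'Brien, 1982b):

$$\sum Z_y = \sum Z_1 = \sum Z_2 = 0$$

$$\sum Z_1^2 = \sum Z_2^2 = n-1$$

$$\sum Z_y Z_1 = (n-1) r_{y1}, \qquad \sum Z_y Z_2 = (n-1) r_{y2}, \qquad \sum Z_1 Z_2 = (n-1) r_{12}$$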
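The facts about standard scores can be checked numerically. The following is a minimal sketch in Python, assuming the sample standard deviation uses the n-1 divisor; the raw scores and the helper names (`standardize`, `pearson_r`) are hypothetical and serve only as an illustration.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the sample SD (n - 1 divisor)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def pearson_r(xs, ys):
    """Product moment correlation computed directly from raw scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw scores for two predictors (illustration only).
x1 = [2.0, 4.0, 5.0, 7.0, 9.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0]
n = len(x1)
z1, z2 = standardize(x1), standardize(x2)

sum_z1 = sum(z1)                               # expected: 0 (up to rounding)
sum_z1_sq = sum(z * z for z in z1)             # expected: n - 1
sum_z1z2 = sum(a * b for a, b in zip(z1, z2))  # expected: (n - 1) * r12
r12 = pearson_r(x1, x2)
```

Comparing `sum_z1_sq` with `n - 1` and `sum_z1z2` with `(n - 1) * r12` shows the agreement up to floating-point rounding.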



If these substitutions are made into the normal equations, we obtain:

$$\begin{aligned}
0 &= nA + 0 + 0 \\
(n-1) r_{y1} &= 0 + B_1 (n-1) + B_2 (n-1) r_{12} \\
(n-1) r_{y2} &= 0 + B_1 (n-1) r_{12} + B_2 (n-1)
\end{aligned}$$

Consequently, A = 0 and may be ignored in subsequent results. If we
divide through the last two normal equations by (n-1) we obtain a
final statement of the normal equations:

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} \\
r_{y2} &= B_1 r_{12} + B_2
\end{aligned}$$

For the reader's convenience we will restate these normal equations
prior to the derivation.
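Although this paper does not solve for the beta weights, a small numerical sketch may help show why the normal equations matter: the weights that satisfy them give a smaller sum of squared errors than nearby weights. The data, the closed-form solution of the two equations, and the helper names below are illustrative assumptions, not part of the original derivation.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the sample SD (n - 1 divisor)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def corr_z(za, zb):
    """Correlation of two standardized variables: sum(Za * Zb) / (n - 1)."""
    return sum(a * b for a, b in zip(za, zb)) / (len(za) - 1)

def sse(zy, z1, z2, b1, b2):
    """Sum of squared errors for the model Zy-hat = B1*Z1 + B2*Z2 (A = 0)."""
    return sum((y - b1 * a - b2 * b) ** 2 for y, a, b in zip(zy, z1, z2))

# Hypothetical raw scores (illustration only).
y = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]
x1 = [2.0, 4.0, 5.0, 7.0, 9.0, 6.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
zy, z1, z2 = standardize(y), standardize(x1), standardize(x2)
r_y1, r_y2, r_12 = corr_z(zy, z1), corr_z(zy, z2), corr_z(z1, z2)

# Solve r_y1 = B1 + B2*r_12 and r_y2 = B1*r_12 + B2 by elimination.
det = 1.0 - r_12 ** 2
b1 = (r_y1 - r_y2 * r_12) / det
b2 = (r_y2 - r_y1 * r_12) / det

# Compare the error at (B1, B2) with the error at eight nearby weight pairs.
best = sse(zy, z1, z2, b1, b2)
nearby = [sse(zy, z1, z2, b1 + d1, b2 + d2)
          for d1 in (-0.1, 0.0, 0.1) for d2 in (-0.1, 0.0, 0.1)
          if (d1, d2) != (0.0, 0.0)]
```

In this sketch every value in `nearby` is larger than `best`, which is what the least squares criterion requires.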

Multiple Correlation for Two Standardized Predictors 

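By definition, the multiple correlation for two standardized
predictors is:

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2} &= \operatorname{corr}(Z_y, Z_{\hat{y}}) = \operatorname{corr}(Z_y, B_1 Z_1 + B_2 Z_2) \\
&= \frac{\operatorname{cov}(Z_y, Z_{\hat{y}})}{\sqrt{\operatorname{var}(Z_y)\operatorname{var}(Z_{\hat{y}})}} \\
&= \frac{\operatorname{cov}(Z_y, B_1 Z_1 + B_2 Z_2)}{\sqrt{\operatorname{var}(Z_y)\operatorname{var}(B_1 Z_1 + B_2 Z_2)}}
\end{aligned}$$

where corr means correlation, cov means covariance, and var means variance.

It is important to remember that $B_1$ and $B_2$ are constants. If we perform
covariance and variance operations on the above correlation formula, we obtain:

$$R_{Z_y \cdot Z_1, Z_2} = \frac{\operatorname{cov}(Z_y, B_1 Z_1) + \operatorname{cov}(Z_y, B_2 Z_2)}{\sqrt{\operatorname{var}(Z_y)}\,\sqrt{\operatorname{var}(B_1 Z_1) + \operatorname{var}(B_2 Z_2) + 2\operatorname{cov}(B_1 Z_1, B_2 Z_2)}}$$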



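Now, $\sqrt{\operatorname{var}(Z_y)}$ is equal to 1 because $Z_y$ is a distribution of observed
standardized criterion scores in a sample; that is,

$$\sqrt{\operatorname{var}(Z_y)} = \sqrt{\operatorname{var}[(Y - \bar{Y})/S_y]} = \sqrt{\operatorname{var}(Y)/S_y^2} = \sqrt{1} = 1.$$

The $\sqrt{\operatorname{var}(Z_{\hat{y}})}$, however, is the standard deviation of the predicted scores that comprise
the regression plane of two variables; i.e., $\sqrt{\operatorname{var}(B_1 Z_1 + B_2 Z_2)}$.
That is, $\operatorname{var}(Z_{\hat{y}})$ is the variance of the equation used in score prediction.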

If we now apply rules of covariance and variance for standard
score variables and constants, we can simplify the above correlation
formula:

$$R_{Z_y \cdot Z_1, Z_2} = \frac{B_1 \operatorname{cov}(Z_y, Z_1) + B_2 \operatorname{cov}(Z_y, Z_2)}{\sqrt{B_1^2 \operatorname{var}(Z_1) + B_2^2 \operatorname{var}(Z_2) + 2 B_1 B_2 \operatorname{cov}(Z_1, Z_2)}}$$



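Further simplification can be achieved if it is recalled that:

1. the covariance of $Z_y$ and $Z_1$ is: $\operatorname{cov}(Z_y, Z_1) = r_{Z_y Z_1} S_{Z_y} S_{Z_1}$,
but $r_{Z_y Z_1} = r_{y1}$, and $S_{Z_y} = S_{Z_1} = 1$ (see O'Brien, 1982b);
similar reasoning can be applied to $\operatorname{cov}(Z_y, Z_2)$ and $\operatorname{cov}(Z_1, Z_2)$.

Therefore,

$$\operatorname{cov}(Z_y, Z_1) = r_{y1}, \qquad \operatorname{cov}(Z_y, Z_2) = r_{y2}, \qquad \operatorname{cov}(Z_1, Z_2) = r_{12}$$

2. $\operatorname{var}(Z_1) = \operatorname{var}(Z_2) = 1$

Making these substitutions,

$$R_{Z_y \cdot Z_1, Z_2} = \frac{B_1 r_{y1} + B_2 r_{y2}}{\sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}}$$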
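The variance rule used for the denominator, var(B1 Z1 + B2 Z2) = B1² + B2² + 2 B1 B2 r12 for standardized predictors, can be illustrated numerically. This is a sketch under stated assumptions: the raw scores and the two constants are arbitrary hypothetical values, and variances use the n-1 divisor.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the sample SD (n - 1 divisor)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical raw scores and arbitrary constants (illustration only).
x1 = [2.0, 4.0, 5.0, 7.0, 9.0, 6.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
b1, b2 = 0.5, 0.3

z1, z2 = standardize(x1), standardize(x2)
composite = [b1 * a + b2 * b for a, b in zip(z1, z2)]  # B1*Z1 + B2*Z2

r_12 = sample_cov(z1, z2)              # covariance of z scores equals r12
lhs = sample_cov(composite, composite)  # var(B1*Z1 + B2*Z2)
rhs = b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r_12
```

The two sides, `lhs` and `rhs`, agree up to rounding for any choice of constants, which is why the rule can be applied before the beta weights are known.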



Derivation 

We are now able to show the derivation of the following multiple
correlation formula, which appears in many applied statistics textbooks
without proof:

$$R_{Z_y \cdot Z_1, Z_2} = \sqrt{B_1 r_{y1} + B_2 r_{y2}}$$

For the reader's convenience, we will restate the normal equations and
the multiple correlation formula presented earlier. See Table 1.

The derivation consists of a) substituting the normal equations
into the numerator of the multiple correlation formula and b) simplifying
algebraically.

See the text following Table 1.

Table 1

Normal Equations and Multiple Correlation Formula for Two
Standardized Predictors

Normal Equations

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} \\
r_{y2} &= B_1 r_{12} + B_2
\end{aligned}$$

Multiple Correlation Formula

$$R_{Z_y \cdot Z_1, Z_2} = \frac{B_1 r_{y1} + B_2 r_{y2}}{\sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}}$$

Note: Proof that $R_{Z_y \cdot Z_1, Z_2} = \sqrt{B_1 r_{y1} + B_2 r_{y2}}$
requires substituting the normal equations into
the numerator of the multiple correlation formula
and simplifying. See text for details.



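If we substitute the normal equations for $r_{y1}$ and $r_{y2}$ into the
numerator of the multiple correlation formula we obtain (see Table 1):

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2} &= \frac{B_1 (B_1 + B_2 r_{12}) + B_2 (B_1 r_{12} + B_2)}{\sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}} \\
&= \frac{B_1^2 + B_1 B_2 r_{12} + B_1 B_2 r_{12} + B_2^2}{\sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}} \\
&= \frac{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}{\sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}} \\
&= \sqrt{B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}}
\end{aligned}$$

(Thus, $B_1 r_{y1} + B_2 r_{y2} = B_1^2 + B_2^2 + 2 B_1 B_2 r_{12}$.) Hence,

$$R_{Z_y \cdot Z_1, Z_2} = \sqrt{B_1 r_{y1} + B_2 r_{y2}}$$

END OF PROOF

Note: Recall from algebra that for any positive algebraic term $x$: $x / \sqrt{x} = \sqrt{x}$.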
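The result of the two predictor proof can be checked with numbers. The sketch below assumes three hypothetical correlations, solves the two normal equations for the beta weights, and compares the ratio form of the multiple correlation with the square root form; none of the values come from the original paper.

```python
import math

# Hypothetical correlations (illustration only).
r_y1, r_y2, r_12 = 0.6, 0.5, 0.3

# Beta weights that satisfy the two normal equations:
#   r_y1 = B1 + B2*r_12   and   r_y2 = B1*r_12 + B2
det = 1.0 - r_12 ** 2
b1 = (r_y1 - r_y2 * r_12) / det
b2 = (r_y2 - r_y1 * r_12) / det

numerator = b1 * r_y1 + b2 * r_y2                       # covariance term
under_radical = b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r_12  # variance term

R_ratio = numerator / math.sqrt(under_radical)  # formula in Table 1
R_short = math.sqrt(numerator)                  # formula being proved
```

Because the numerator and the quantity under the radical are equal once the normal equations hold, `R_ratio` and `R_short` coincide up to rounding.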
Derivation for Three Predictors

Prior to the derivation for p predictors (the general case), let
us consider the case for 3 predictors. This will allow us to review
the logic of the derivation. In addition, we will introduce the use of
summation notation, which is necessary for the p predictor derivation.

The first step is to state the regression model for three standardized
predictors:

$$Z_{\hat{y}} = A + B_1 Z_1 + B_2 Z_2 + B_3 Z_3$$

The least squares criterion for this model is:

$$\sum (Z_y - Z_{\hat{y}})^2 = \sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3)^2 = \text{a minimum}$$

The next step is to derive the normal equations. As outlined in the
Appendix, the normal equations for 3 predictors (in simplified form) are:

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} + B_3 r_{13} \\
r_{y2} &= B_1 r_{12} + B_2 + B_3 r_{23} \\
r_{y3} &= B_1 r_{13} + B_2 r_{23} + B_3
\end{aligned}$$

For the reader's convenience we will restate the normal equations prior
to the derivation.

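The third step is to define the multiple correlation between $Z_y$
and the three predictors:

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2, Z_3} &= \operatorname{corr}(Z_y, Z_{\hat{y}}) = \operatorname{corr}(Z_y, B_1 Z_1 + B_2 Z_2 + B_3 Z_3) \\
&= \frac{\operatorname{cov}(Z_y, Z_{\hat{y}})}{\sqrt{\operatorname{var}(Z_y)\operatorname{var}(Z_{\hat{y}})}} \\
&= \frac{\operatorname{cov}(Z_y, B_1 Z_1 + B_2 Z_2 + B_3 Z_3)}{\sqrt{\operatorname{var}(B_1 Z_1 + B_2 Z_2 + B_3 Z_3)}}
\end{aligned}$$

Recall that $\sqrt{\operatorname{var}(Z_y)} = 1$.

Applying the rules of covariance and variance algebra for standard
score variables and constants:

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2, Z_3} &= \frac{\operatorname{cov}(Z_y, B_1 Z_1) + \operatorname{cov}(Z_y, B_2 Z_2) + \operatorname{cov}(Z_y, B_3 Z_3)}{\sqrt{\operatorname{var}(B_1 Z_1) + \operatorname{var}(B_2 Z_2) + \operatorname{var}(B_3 Z_3) + 2\operatorname{cov}(B_1 Z_1, B_2 Z_2) + 2\operatorname{cov}(B_1 Z_1, B_3 Z_3) + 2\operatorname{cov}(B_2 Z_2, B_3 Z_3)}} \\
&= \frac{B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3}}{\sqrt{B_1^2 \operatorname{var}(Z_1) + B_2^2 \operatorname{var}(Z_2) + B_3^2 \operatorname{var}(Z_3) + 2 B_1 B_2 \operatorname{cov}(Z_1, Z_2) + 2 B_1 B_3 \operatorname{cov}(Z_1, Z_3) + 2 B_2 B_3 \operatorname{cov}(Z_2, Z_3)}} \\
&= \frac{B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3}}{\sqrt{B_1^2 + B_2^2 + B_3^2 + 2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_2 B_3 r_{23}}}
\end{aligned}$$
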

For easy reference, the multiple correlation and normal equations
are restated in Table 2.

Table 2

Normal Equations and Multiple Correlation Formula for Three
Standardized Predictors

Normal Equations

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} + B_3 r_{13} \\
r_{y2} &= B_1 r_{12} + B_2 + B_3 r_{23} \\
r_{y3} &= B_1 r_{13} + B_2 r_{23} + B_3
\end{aligned}$$

Multiple Correlation Formula

$$R_{Z_y \cdot Z_1, Z_2, Z_3} = \frac{B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3}}{\sqrt{B_1^2 + B_2^2 + B_3^2 + 2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_2 B_3 r_{23}}}$$

Note: Proof that $R_{Z_y \cdot Z_1, Z_2, Z_3} = \sqrt{B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3}}$
requires substituting the normal equations into the numerator of the
multiple correlation formula and simplifying. See text for details.
The fourth step is to substitute the normal equations into the
numerator of the multiple correlation formula. Substituting the normal
equations:

$$\begin{aligned}
\operatorname{cov}(Z_y, Z_{\hat{y}}) &= B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} \\
&= (B_1^2 + B_1 B_2 r_{12} + B_1 B_3 r_{13}) + (B_1 B_2 r_{12} + B_2^2 + B_2 B_3 r_{23}) + (B_1 B_3 r_{13} + B_2 B_3 r_{23} + B_3^2)
\end{aligned}$$

If we write each parenthesized term on a separate line, we obtain:

$$\begin{aligned}
\operatorname{cov}(Z_y, Z_{\hat{y}}) = {} & B_1^2 + B_1 B_2 r_{12} + B_1 B_3 r_{13} \; + \\
& B_1 B_2 r_{12} + B_2^2 + B_2 B_3 r_{23} \; + \\
& B_1 B_3 r_{13} + B_2 B_3 r_{23} + B_3^2
\end{aligned}$$

Let us pause for a moment and consider how to write out this covariance matrix
in summation notation.

It is clear that the three squared B terms can be written in summation
notation as:

$$\sum_{j=1}^{3} B_j^2$$

The remainder of the terms consists of three pairs of quantities:

$$2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_2 B_3 r_{23}$$

One common way to write this in summation notation is as follows:

$$2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_2 B_3 r_{23} = 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij} \qquad (i < j)$$

Only pairs with i < j are written; the factor of 2 accounts for the symmetric
partner of each pair. Counting each pair twice, the total number of off diagonal
terms can be determined by multiplying the upper limits of the summation
(3 x 2 = 6). Also, from the summation limits (i = 1, 2; j = 2, 3)
it is clear that the first term is $B_1 B_2 r_{12}$ and the last term is $B_2 B_3 r_{23}$. This
leaves 4 of the 6 terms to be filled in. Start from $B_1 B_2 r_{12}$ and increment the
summation limits by one: begin with i = 1 and increment j until it is exhausted (2, 3),
then go back to i and increment it to 2, and increment j (over values greater than i)
until the limits are exhausted. It is helpful to have the covariance matrix written out
such that the term pairs can be "read off".
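The order in which the pairs are "read off" can be mimicked with a short loop. This is only a sketch: the function name `off_diagonal_pairs`, the beta weights and the correlations are hypothetical, and the loop assumes the convention that the first subscript is always smaller than the second.

```python
def off_diagonal_pairs(p):
    """Index pairs (i, j) with i < j: fix i, run j over the larger subscripts."""
    return [(i, j) for i in range(1, p) for j in range(i + 1, p + 1)]

pairs = off_diagonal_pairs(3)  # the three distinct pairs for 3 predictors

# Hypothetical beta weights and predictor intercorrelations (illustration only).
B = {1: 0.4, 2: 0.3, 3: 0.2}
r = {(1, 2): 0.5, (1, 3): 0.1, (2, 3): 0.25}

# Double summation with the factor of 2 for the symmetric partners.
double_sum = 2 * sum(B[i] * B[j] * r[(i, j)] for i, j in pairs)

# The same quantity written out term by term, as in the text.
written_out = (2 * B[1] * B[2] * r[(1, 2)]
               + 2 * B[1] * B[3] * r[(1, 3)]
               + 2 * B[2] * B[3] * r[(2, 3)])
```

The list `pairs` begins with (1, 2) and ends with (2, 3), matching the first and last terms named above.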

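Thus, the covariance term of the multiple correlation formula can be
written in summation notation as follows:

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} = \sum_{j=1}^{3} B_j r_{yj} = \sum_{j=1}^{3} B_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij} \qquad (i < j)$$

Turning to the denominator of the correlation formula, it is readily
apparent that the quantity under the radical is identical to the covariance
term above. That is:

$$\operatorname{var}(Z_{\hat{y}}) = B_1^2 + B_2^2 + B_3^2 + 2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_2 B_3 r_{23} = \sum_{j=1}^{3} B_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij} \qquad (i < j)$$

Hence, the multiple correlation written in summation notation is:

$$R_{Z_y \cdot Z_1, Z_2, Z_3} = \frac{\sum_{j=1}^{3} B_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij}}{\sqrt{\sum_{j=1}^{3} B_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij}}}$$

Making the same algebraic simplification we made for the two predictor derivation,
we obtain:

$$R_{Z_y \cdot Z_1, Z_2, Z_3} = \sqrt{\sum_{j=1}^{3} B_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{2} B_i B_j r_{ij}} = \sqrt{\sum_{j=1}^{3} B_j r_{yj}}$$

This completes the derivation for three predictors. We now derive the
case for any number of predictors in the regression model.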
Derivation for p Predictors

Regression Model for p Predictors in Standard Score Form

The derivation for any possible number of predictors (p) will
be worked out following the same steps used in the derivation
for 2 and 3 predictors. A restatement of these steps for p predictors is:

1. state the regression model for p predictors

2. derive the normal equations

3. define the multiple correlation

4. substitute the normal equations into the numerator of
the multiple correlation formula

5. express the covariance term in summation notation

6. express the variance term in summation notation

7. simplify algebraically

The linear regression model for p predictors in standard score form is:

$$Z_{\hat{y}} = A + B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p$$

(Shortly we will see that the A term is equal to 0 and can be ignored, as
we discovered in the 2 and 3 predictor models.)

The least squares criterion is:

$$\sum (Z_y - Z_{\hat{y}})^2 = \sum e_Z^2 = \text{a minimum}$$

Substituting for $Z_{\hat{y}}$, the least squares criterion is:

$$\sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3 - \cdots - B_j Z_j - \cdots - B_p Z_p)^2 = \text{a minimum}$$
Finding the normal equations involves the same procedure used for
the 2 or 3 predictor models. See the Appendix for details. The normal
equations (before simplification) are stated below.

Note: Some of the cross-product sums in the equations are written with
the subscripts reversed. Since these sums are symmetric such that
$\sum Z_1 Z_2 = \sum Z_2 Z_1$, $\sum Z_3 Z_4 = \sum Z_4 Z_3$ and, in general, $\sum Z_i Z_j = \sum Z_j Z_i$, we have
written these terms such that the first subscript is always less than
the second subscript. This method of notation helps simplify the algebra.

$$\begin{aligned}
\sum Z_y &= nA + B_1 \sum Z_1 + B_2 \sum Z_2 + B_3 \sum Z_3 + \cdots + B_j \sum Z_j + \cdots + B_p \sum Z_p \\
\sum Z_y Z_1 &= A \sum Z_1 + B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2 + B_3 \sum Z_1 Z_3 + \cdots + B_j \sum Z_1 Z_j + \cdots + B_p \sum Z_1 Z_p \\
\sum Z_y Z_2 &= A \sum Z_2 + B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2 + B_3 \sum Z_2 Z_3 + \cdots + B_j \sum Z_2 Z_j + \cdots + B_p \sum Z_2 Z_p \\
\sum Z_y Z_3 &= A \sum Z_3 + B_1 \sum Z_1 Z_3 + B_2 \sum Z_2 Z_3 + B_3 \sum Z_3^2 + \cdots + B_j \sum Z_3 Z_j + \cdots + B_p \sum Z_3 Z_p \\
&\;\;\vdots \\
\sum Z_y Z_p &= A \sum Z_p + B_1 \sum Z_1 Z_p + B_2 \sum Z_2 Z_p + B_3 \sum Z_3 Z_p + \cdots + B_j \sum Z_j Z_p + \cdots + B_p \sum Z_p^2
\end{aligned}$$


If we apply the same logic and make the same substitutions as we did previously for the 
two and three predictor models, we obtain a simplified set of normal equations: 



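$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} + B_3 r_{13} + \cdots + B_j r_{1j} + \cdots + B_p r_{1p} \\
r_{y2} &= B_1 r_{12} + B_2 + B_3 r_{23} + \cdots + B_j r_{2j} + \cdots + B_p r_{2p} \\
r_{y3} &= B_1 r_{13} + B_2 r_{23} + B_3 + \cdots + B_j r_{3j} + \cdots + B_p r_{3p} \\
&\;\;\vdots \\
r_{yp} &= B_1 r_{1p} + B_2 r_{2p} + B_3 r_{3p} + \cdots + B_j r_{jp} + \cdots + B_p
\end{aligned}$$

These are the normal equations we want to work with in the derivation. A restatement of them is given
later.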
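To give the reader a "feel" for the notation in multivariate
problems, we will work out the normal equations for 5 predictors.
(The reader who has studied the Appendix may wish to attempt writing out
the normal equations in advance of seeing the results.)
The least squares criterion is:

$$\sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3 - B_4 Z_4 - B_5 Z_5)^2 = \text{a minimum}$$

Using the procedure outlined in the Appendix, the normal equations
are as follows:

$$\begin{aligned}
\sum Z_y &= nA + B_1 \sum Z_1 + B_2 \sum Z_2 + B_3 \sum Z_3 + B_4 \sum Z_4 + B_5 \sum Z_5 \\
\sum Z_y Z_1 &= A \sum Z_1 + B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2 + B_3 \sum Z_1 Z_3 + B_4 \sum Z_1 Z_4 + B_5 \sum Z_1 Z_5 \\
\sum Z_y Z_2 &= A \sum Z_2 + B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2 + B_3 \sum Z_2 Z_3 + B_4 \sum Z_2 Z_4 + B_5 \sum Z_2 Z_5 \\
\sum Z_y Z_3 &= A \sum Z_3 + B_1 \sum Z_1 Z_3 + B_2 \sum Z_2 Z_3 + B_3 \sum Z_3^2 + B_4 \sum Z_3 Z_4 + B_5 \sum Z_3 Z_5 \\
\sum Z_y Z_4 &= A \sum Z_4 + B_1 \sum Z_1 Z_4 + B_2 \sum Z_2 Z_4 + B_3 \sum Z_3 Z_4 + B_4 \sum Z_4^2 + B_5 \sum Z_4 Z_5 \\
\sum Z_y Z_5 &= A \sum Z_5 + B_1 \sum Z_1 Z_5 + B_2 \sum Z_2 Z_5 + B_3 \sum Z_3 Z_5 + B_4 \sum Z_4 Z_5 + B_5 \sum Z_5^2
\end{aligned}$$

Simplifying, we get:

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} + B_3 r_{13} + B_4 r_{14} + B_5 r_{15} \\
r_{y2} &= B_1 r_{12} + B_2 + B_3 r_{23} + B_4 r_{24} + B_5 r_{25} \\
r_{y3} &= B_1 r_{13} + B_2 r_{23} + B_3 + B_4 r_{34} + B_5 r_{35} \\
r_{y4} &= B_1 r_{14} + B_2 r_{24} + B_3 r_{34} + B_4 + B_5 r_{45} \\
r_{y5} &= B_1 r_{15} + B_2 r_{25} + B_3 r_{35} + B_4 r_{45} + B_5
\end{aligned}$$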
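Multiple Correlation for p Standardized Predictors and Derivation

We are now ready to derive the multiple correlation for p predictors.
See Table 3 for a statement of the multiple correlation formula.
The covariance term is:

$$\operatorname{cov}(Z_y, B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p) = B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}$$

Substituting the normal equations (see Table 3),

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = B_1 (B_1 + B_2 r_{12} + \cdots + B_p r_{1p}) + B_2 (B_1 r_{12} + B_2 + \cdots + B_p r_{2p}) + \cdots + B_p (B_1 r_{1p} + B_2 r_{2p} + \cdots + B_p)$$

In order to express this in summation notation, let us write out the full
$\operatorname{cov}(Z_y, Z_{\hat{y}})$. Multiply each $B_j$ term inside the parentheses and state each
parenthesized term on a separate line, giving:

$$\begin{aligned}
\operatorname{cov}(Z_y, Z_{\hat{y}}) &= B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp} \\
= {} & B_1^2 + B_1 B_2 r_{12} + B_1 B_3 r_{13} + \cdots + B_1 B_j r_{1j} + \cdots + B_1 B_p r_{1p} \; + \\
& B_1 B_2 r_{12} + B_2^2 + B_2 B_3 r_{23} + \cdots + B_2 B_j r_{2j} + \cdots + B_2 B_p r_{2p} \; + \\
& B_1 B_3 r_{13} + B_2 B_3 r_{23} + B_3^2 + \cdots + B_3 B_j r_{3j} + \cdots + B_3 B_p r_{3p} \; + \\
& \;\;\vdots \\
& B_1 B_p r_{1p} + B_2 B_p r_{2p} + B_3 B_p r_{3p} + \cdots + B_j B_p r_{jp} + \cdots + B_p^2
\end{aligned}$$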
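Table 3

Normal Equations and Multiple Correlation Formula for p Standardized Predictors

Normal Equations

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} + B_3 r_{13} + \cdots + B_j r_{1j} + \cdots + B_p r_{1p} \\
r_{y2} &= B_1 r_{12} + B_2 + B_3 r_{23} + \cdots + B_j r_{2j} + \cdots + B_p r_{2p} \\
r_{y3} &= B_1 r_{13} + B_2 r_{23} + B_3 + \cdots + B_j r_{3j} + \cdots + B_p r_{3p} \\
&\;\;\vdots \\
r_{yp} &= B_1 r_{1p} + B_2 r_{2p} + B_3 r_{3p} + \cdots + B_j r_{jp} + \cdots + B_p
\end{aligned}$$

Multiple Correlation Formula

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2, Z_3, \ldots, Z_p} &= \operatorname{corr}(Z_y, Z_{\hat{y}}) = \operatorname{corr}(Z_y, B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p) \\
&= \frac{\operatorname{cov}(Z_y, Z_{\hat{y}})}{\sqrt{\operatorname{var}(Z_y)\operatorname{var}(Z_{\hat{y}})}} \\
&= \frac{\operatorname{cov}(Z_y, B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p)}{\sqrt{\operatorname{var}(B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p)}} \\
&= \frac{B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}}{\sqrt{B_1^2 + B_2^2 + B_3^2 + \cdots + B_p^2 + 2 B_1 B_2 r_{12} + 2 B_1 B_3 r_{13} + 2 B_1 B_4 r_{14} + \cdots + 2 B_i B_j r_{ij} + \cdots + 2 B_{p-1} B_p r_{p-1,p}}}
\end{aligned}$$

Note: Proof requires substituting the normal equations into the numerator of the multiple correlation
formula and simplifying. See text for details.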
We want to express this sum in summation notation to simplify the
algebra. First, count the number of terms to be summed. It is evident
that each row of the covariance matrix consists of p terms. Also, there are
a total of p rows. Hence, there are a total of (p)(p) = $p^2$ terms in the
entire matrix. It is also evident that each row contains one $B_j^2$ term, or
a total of p $B_j^2$ terms in the entire matrix (along the northwest to southeast
diagonal of the matrix). How many other terms are in the matrix (off diagonal
terms) can be answered with a little algebra. Let X represent the number
of off diagonal ($B_i B_j r_{ij}$) terms. Then:

$$p^2 = p + X$$

or

$$X = p^2 - p = p(p-1)$$

Thus, there are p $B_j^2$ terms and p(p-1) $B_i B_j r_{ij}$ terms in the entire matrix.

Another view is as follows. The diagonal ($B_j^2$ terms) consists of
p terms. The remainder, the off diagonal terms, consists of a number of
identical pairs of terms. If we halve the matrix and visualize the upper
half only, then we are thinking of p $B_j^2$ terms plus one-half of the $B_i B_j r_{ij}$
terms. That is, because of the symmetry of the off diagonal terms in
the matrix, the off diagonal of the half matrix consists of p(p-1)/2 terms.
The total number of terms in the half matrix is: p + p(p-1)/2. To represent
the total number of terms in the entire matrix, simply double the number of
off diagonal terms in the half matrix: $p^2$ = p + 2[p(p-1)/2] = p + p(p-1).
Examine the normal equations for 5 predictors given earlier for clarification.
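The counting argument can be confirmed by brute force. The sketch below walks over every cell of a p by p matrix and tallies diagonal and off diagonal cells; the function name `count_terms` and the chosen values of p are illustrative assumptions.

```python
def count_terms(p):
    """Count diagonal (B_j squared) and off diagonal (B_i B_j r_ij) cells."""
    diagonal = 0
    off_diagonal = 0
    for i in range(1, p + 1):
        for j in range(1, p + 1):
            if i == j:
                diagonal += 1      # one squared term per row
            else:
                off_diagonal += 1  # every other cell in the row
    return diagonal, off_diagonal

# Tally the counts for a few model sizes, including the 5 predictor example.
counts = {p: count_terms(p) for p in (2, 3, 5)}
```

For each p the two counts add up to p squared, and the off diagonal count equals p(p-1), half of which lies above the diagonal.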

As explained, the $\operatorname{cov}(Z_y, Z_{\hat{y}})$ consists of $p^2$ terms. There are p $B_j^2$
and p(p-1) $B_i B_j r_{ij}$ (or 2[p(p-1)/2]) terms. Consequently, we can write
the covariance of the multiple correlation formula as:

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = \sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij} \qquad (i < j)$$

This is simply a generalization of the 2 and 3 predictor models. The first
term of the double summation will be $B_1 B_2 r_{12}$ and the last term will be
$B_{p-1} B_p r_{p-1,p}$ (see Table 3). For example, in a 5 predictor model, there are
$5^2$ = 25 terms; 5 are $B_j^2$ terms and 5(5-1) = 20 are $B_i B_j r_{ij}$ terms (or 10 pairs
of terms [2p(p-1)/2 = 2 x 5(4)/2 = 2 x 10 = 20]). The first term will
be $B_1 B_2 r_{12}$ and the last $B_4 B_5 r_{45}$.

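Therefore we can express the covariance term in the p predictor
model as:

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = B_1 r_{y1} + B_2 r_{y2} + B_3 r_{y3} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp} = \sum_{j=1}^{p} B_j r_{yj}$$

Equivalently,

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = \sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij} \qquad (i < j)$$

Thus,

$$\operatorname{cov}(Z_y, Z_{\hat{y}}) = \sum_{j=1}^{p} B_j r_{yj} = \sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij} \qquad (i < j)$$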
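Turning now to the variance term of the correlation formula,
we can apply rules of variance and covariance algebra to the $B_j Z_j$ terms
of $\operatorname{var}(Z_{\hat{y}})$ (see Table 3). Again, the results of these manipulations are
simply generalizations of the 2 and 3 predictor models. From an inspection
of Table 3, it should be apparent that we can express the variance term as follows:

$$\begin{aligned}
\operatorname{var}(Z_{\hat{y}}) &= \operatorname{var}(B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + \cdots + B_j Z_j + \cdots + B_p Z_p) \\
&= B_1^2 + B_2^2 + B_3^2 + \cdots + B_j^2 + \cdots + B_p^2 + 2 B_1 B_2 r_{12} + \cdots + 2 B_{p-1} B_p r_{p-1,p}
\end{aligned}$$

Thus, in summation notation:

$$\operatorname{var}(Z_{\hat{y}}) = \sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij} \qquad (i < j)$$

Therefore, the multiple correlation for p predictors is:

$$\begin{aligned}
R_{Z_y \cdot Z_1, Z_2, Z_3, \ldots, Z_j, \ldots, Z_p} &= \frac{\sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij}}{\sqrt{\sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij}}} \\
&= \sqrt{\sum_{j=1}^{p} B_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{p-1} B_i B_j r_{ij}} \\
&= \sqrt{\sum_{j=1}^{p} B_j r_{yj}}
\end{aligned}$$

END OF PROOF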
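The general result can be illustrated for a model with more than three predictors. The sketch below assumes a hypothetical set of correlations for p = 4, solves the simplified normal equations with a small Gaussian elimination routine (the routine and all names are illustrative, not taken from the paper), and compares the weighted sum of criterion correlations with the expanded sum of squares and cross products.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

# Hypothetical predictor intercorrelations r_ij and criterion correlations r_yj.
r_xx = [[1.0, 0.3, 0.2, 0.1],
        [0.3, 1.0, 0.4, 0.2],
        [0.2, 0.4, 1.0, 0.3],
        [0.1, 0.2, 0.3, 1.0]]
r_y = [0.5, 0.4, 0.3, 0.2]
p = len(r_y)

B = solve_linear_system(r_xx, r_y)  # beta weights from the normal equations

weighted_sum = sum(B[j] * r_y[j] for j in range(p))  # sum of B_j * r_yj
squares = sum(b * b for b in B)                      # sum of B_j squared
cross = 2 * sum(B[i] * B[j] * r_xx[i][j]             # pairs with i < j
                for j in range(1, p) for i in range(j))
R = math.sqrt(weighted_sum)
```

Here `weighted_sum` equals `squares + cross` up to rounding, which is the identity the proof relies on, and `R` is the resulting multiple correlation for these assumed values.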
APPENDIX
Finding Normal Equations



Introduction 

This appendix will outline in detail the procedures for finding 
the normal equations in regression analysis. The procedures described 
are appropriate when: 

a) the regression model is linear and 

b) the variables are standardized. 

An example of a nonlinear model is:

$$Z_{\hat{y}} = A + B_1 Z_1 + B_2 Z_2^2$$

The model is not linear because one of the variables ($Z_2$) is raised to
the second power ($Z_2^2$). The procedures described in this appendix would
not be applicable for such a model.

Just what is a normal equation? This is a question often asked by
students. One way to answer this question is to say that a normal equation
is one of the equations that results from a calculus technique called
partial differentiation applied to a regression model to satisfy
a criterion of minimization. For example, a regression model of two
predictors contains three constants ($A$, $B_1$, $B_2$) which must be solved for in
order to minimize the function

$$\sum (Z_y - Z_{\hat{y}})^2 = \sum (Z_y - A - B_1 Z_1 - B_2 Z_2)^2 = \sum e_Z^2$$

using the least squares criterion (smallest or minimum error when the idealized
model is used for prediction of actual or observed criterion scores, $Z_y$). In
this example, the calculus procedure is applied to each of the three
terms individually. When the procedure is applied for one of the terms,
and the result is solved algebraically in terms of the criterion variable, the
result is termed the "normal equation" for that term.
In this paper we are not interested in solving for the terms of
the model per se. Rather we are interested in using the normal equations
to derive the multiple correlation formula for standardized scores. The
normal equations allow one to accomplish this. In fact, we are following
the same steps that would be used in actually solving for the terms that
satisfy the least squares criterion up to the point when the normal
equations are derived for a regression model. Since this paper does not
assume a knowledge of calculus, a heuristic procedure is given for
finding normal equations. Students who are familiar with calculus
can read any text of mathematical statistics for technical details.

Plan 

The plan for finding normal equations is outlined in four phases 
as follows: 

A. state the regression model,

B. state the mathematical function of the least squares
criterion, $\sum (Z_y - Z_{\hat{y}})^2$,

C. derive the normal equations for each of the terms and simplify,

D. summarize the normal equations.

Finding Normal Equations for the Two Predictor Model 

We will demonstrate the four phase procedure first for the 2 predictor
model.

A. The regression model for 2 predictors:

$$Z_{\hat{y}} = A + B_1 Z_1 + B_2 Z_2$$

B. The mathematical function to be minimized according to the
least squares criterion is:

$$\sum (Z_y - A - B_1 Z_1 - B_2 Z_2)^2$$

C. The procedures for deriving the normal equations for the constants
are as follows:
1. The procedure for finding the normal equation for A is summarized
in 5 steps:

   1. drop the exponent 2 and set the function in phase B equal to 0

   2. distribute the summation operator

   3. apply the rules of summation for constants

   4. solve in terms of the criterion variable, $Z_y$

   5. apply the rules for standard scores and simplify (see the note below).

Applying each of the steps in turn produces:

   1. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2) = 0$

   2. $\sum Z_y - \sum A - \sum B_1 Z_1 - \sum B_2 Z_2 = 0$

   3. $\sum Z_y - nA - B_1 \sum Z_1 - B_2 \sum Z_2 = 0$

   4. $\sum Z_y = nA + B_1 \sum Z_1 + B_2 \sum Z_2$

   5. $0 = nA + B_1 \cdot 0 + B_2 \cdot 0$

Solving for A in step 5 shows that A = 0. As a general rule, A
will always be 0 when the regression function is linear and stated in
standard score form.

Note: For students who need to review standard scores, see O'Brien, 1982b. Also,
see the facts about standard scores listed in the derivation for two predictors
in the present paper for examples of rules.
2. The procedure for finding the normal equation for $B_1$ can be summarized
in 7 steps:

   1. drop the exponent 2 and set the function in phase B equal to 0

   2. multiply the function by $Z_1$

   3. distribute the $Z_1$ term

   4. distribute the summation operator

   5. apply rules of summation for constants

   6. solve in terms of the criterion variable, $Z_y$

   7. apply rules for standard scores and simplify.

Note that we do not try to solve for $B_1$ in this procedure. We
are applying a procedure to find the normal equation for $B_1$. Applying the above
7 steps:

   1. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2) = 0$

   2. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2) Z_1 = 0$

   3. $\sum (Z_y Z_1 - A Z_1 - B_1 Z_1^2 - B_2 Z_1 Z_2) = 0$

   4. $\sum Z_y Z_1 - \sum A Z_1 - \sum B_1 Z_1^2 - \sum B_2 Z_1 Z_2 = 0$

   5. $\sum Z_y Z_1 - A \sum Z_1 - B_1 \sum Z_1^2 - B_2 \sum Z_1 Z_2 = 0$

   6. $\sum Z_y Z_1 = A \sum Z_1 + B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2$

   7. $(n-1) r_{y1} = 0 + B_1 (n-1) + B_2 (n-1) r_{12}$

Dividing through by n-1, we obtain a simplified statement of the
normal equation for $B_1$:

$$r_{y1} = B_1 + B_2 r_{12}$$
3. The steps for finding the normal equation for $B_2$ parallel those
for $B_1$:

   1. drop the exponent 2 and set the function in phase B equal to 0

   2. multiply the function by $Z_2$

   3. distribute the $Z_2$ term

   4. distribute the summation operator

   5. apply rules of summation for constants

   6. solve in terms of the criterion variable, $Z_y$

   7. apply rules for standard scores and simplify.

To reiterate, we are not solving for $B_2$. Applying the 7 steps:

   1. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2) = 0$

   2. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2) Z_2 = 0$

   3. $\sum (Z_y Z_2 - A Z_2 - B_1 Z_1 Z_2 - B_2 Z_2^2) = 0$

   4. $\sum Z_y Z_2 - \sum A Z_2 - \sum B_1 Z_1 Z_2 - \sum B_2 Z_2^2 = 0$

   5. $\sum Z_y Z_2 - A \sum Z_2 - B_1 \sum Z_1 Z_2 - B_2 \sum Z_2^2 = 0$

   6. $\sum Z_y Z_2 = A \sum Z_2 + B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2$

   7. $(n-1) r_{y2} = 0 + B_1 (n-1) r_{12} + B_2 (n-1)$

Dividing through by (n-1),

$$r_{y2} = B_1 r_{12} + B_2$$
D. We now write out a full statement of the normal equations to
summarize the results. As noted, a normal equation is considered
derived at the point when we solve in terms of the criterion variable.
Subsequent steps are employed to simplify the result. The
normal equations for $A$, $B_1$ and $B_2$ were:

for $A$: $\sum Z_y = nA + B_1 \sum Z_1 + B_2 \sum Z_2$

for $B_1$: $\sum Z_y Z_1 = A \sum Z_1 + B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2$

for $B_2$: $\sum Z_y Z_2 = A \sum Z_2 + B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2$

When we applied rules for standard scores and simplified, we
obtained the following set of normal equations used in the
derivation for two predictors (recall A = 0):

$$\begin{aligned}
r_{y1} &= B_1 + B_2 r_{12} \\
r_{y2} &= B_1 r_{12} + B_2
\end{aligned}$$



Finding Normal Equations for p Predictors 

The rules for finding normal equations for the two predictor 
model can be generalized readily for models with any number (p) of predictors. 
We will show two methods for the general case. First we will demonstrate 
the procedure using the four phase plan. Then we show a much simpler method. 
But the shorter method depends on first showing the longer one. 
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Applying the four phase plan gives the following results: 

A. The regression model for p predictors is: 

« A * Vi * ^ V3 ^^ Vp 

B. The function to be minimized according to the least squares 
criterion is: 



2 

Y " "l"l "2"2 "3"3 "j'j "P"P 



Y|(Z„ - A - B,Z, - B^Z^ - B,Z^ B^Z. . B^Z^) 



C. The procedures for finding normal equations for A and any B^ term are: 

1. In deriving the normal equation for A, the result is always the 
same" A * 0. 

2. Finding the normal equation for any B^ term can be done 
in 7 steps: 

1. drop the exponent 2 and set function in phase B equal to 0 

2. multiply the function by Z^ 

3. distribute the Z^ term 

4. distribute the summation operator 

5. apply rules of summation for constants 

6. solve in terms of the criterion variable, Z^ 

7. apply rules for standard scores and simplify. 
Applying these steps in turn produces : 

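1. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3 - \cdots - B_j Z_j - \cdots - B_p Z_p) = 0$

2. $\sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3 - \cdots - B_j Z_j - \cdots - B_p Z_p) Z_j = 0$

3. $\sum (Z_y Z_j - A Z_j - B_1 Z_1 Z_j - B_2 Z_2 Z_j - B_3 Z_3 Z_j - \cdots - B_j Z_j^2 - \cdots - B_p Z_j Z_p) = 0$

4. $\sum Z_y Z_j - \sum A Z_j - \sum B_1 Z_1 Z_j - \sum B_2 Z_2 Z_j - \sum B_3 Z_3 Z_j - \cdots - \sum B_j Z_j^2 - \cdots - \sum B_p Z_j Z_p = 0$

5. $\sum Z_y Z_j - A \sum Z_j - B_1 \sum Z_1 Z_j - B_2 \sum Z_2 Z_j - B_3 \sum Z_3 Z_j - \cdots - B_j \sum Z_j^2 - \cdots - B_p \sum Z_j Z_p = 0$

6. $\sum Z_y Z_j = A \sum Z_j + B_1 \sum Z_1 Z_j + B_2 \sum Z_2 Z_j + B_3 \sum Z_3 Z_j + \cdots + B_j \sum Z_j^2 + \cdots + B_p \sum Z_j Z_p$

7. $(n-1) r_{yj} = 0 + B_1 (n-1) r_{1j} + B_2 (n-1) r_{2j} + B_3 (n-1) r_{3j} + \cdots + B_j (n-1) + \cdots + B_p (n-1) r_{jp}$

Dividing through by (n-1),

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + \cdots + B_j + \cdots + B_p r_{jp}$$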
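Step 7 rests on three rules for standard scores: the sum of the scores is 0, the sum of the squared scores is n-1, and the sum of cross products of two standardized variables is (n-1) times their correlation. The sketch below checks these rules numerically. It is an illustration only: the raw scores are hypothetical, the helper names `standardize` and `pearson_r` are our own, and we assume the sample standard deviation (divisor n-1), which is what the (n-1) terms in the derivation imply.

```python
import math


def standardize(values):
    """Convert raw scores to z scores using the sample SD (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Product moment correlation computed from raw scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores for two predictors (assumed data).
x1 = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]
n = len(x1)
z1, z2 = standardize(x1), standardize(x2)

sum_z1 = sum(z1)                                # rule: equals 0
sum_z1_sq = sum(z * z for z in z1)              # rule: equals n - 1
sum_z1z2 = sum(a * b for a, b in zip(z1, z2))   # rule: equals (n - 1) * r_12
r_12 = pearson_r(x1, x2)
print(sum_z1, sum_z1_sq, sum_z1z2, (n - 1) * r_12)
```

These are the substitutions that turn step 6 into step 7: each $\sum Z_j^2$ becomes (n-1), each $\sum Z_j Z_k$ becomes $(n-1) r_{jk}$, and the $A \sum Z_j$ term vanishes.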



Alternate Procedure 



The above normal equation ($r_{yj}$) is a general result for any $B_j$ term.
Now a much simpler method for finding a full set of normal equations will be
discussed.

Recall that the correlation of a variable with itself is equal to
1. That is, $r_{11} = r_{22} = r_{33} = \cdots = r_{pp} = 1$. From this fact and the general
normal equation, one can write out the entire set of normal equations for any
number of predictors. If $r_{yj}$ holds for any $B_j$ term, then it holds for
j=1, j=2, j=3, ..., j=p. For example, assume p=2 predictors. We know that
the set of normal equations will consist of (p)(p) = $2^2$ terms. Thus,
first write out $r_{yj}$ twice as follows:

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j}$$

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j}$$

Now substitute the appropriate j value: j=1 throughout for line 1, and j=2
throughout for line 2:
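$$r_{y1} = B_1 r_{11} + B_2 r_{21}$$

$$r_{y2} = B_1 r_{12} + B_2 r_{22}$$

or

$$r_{y1} = B_1 + B_2 r_{21}$$

$$r_{y2} = B_1 r_{12} + B_2$$

or

$$r_{y1} = B_1 + B_2 r_{12}$$

$$r_{y2} = B_1 r_{12} + B_2$$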

The third set shows the subscripts in reverse order for the $B_2$ term
correlation, $r_{21} = r_{12}$. Although not necessary, it may be easier to do
this in order to conform to this convention as we have done in the derivations.

In summary, finding normal equations for any number of predictors
involves the repeated application of several steps for each of the $B_j$ terms.
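The short method is mechanical enough to be written as a small routine. The sketch below is our own illustration (the function name `normal_equations` is hypothetical): it writes out the general equation once for each j, replaces $r_{jj}$ by 1, and orders the subscripts of each predictor correlation so that the first is less than the second. Subscripts are printed without separators, so the output is unambiguous only for fewer than ten predictors.

```python
def normal_equations(p):
    """Return the p normal equations for p standardized predictors as strings."""
    equations = []
    for j in range(1, p + 1):
        terms = []
        for k in range(1, p + 1):
            if k == j:
                # r_jj = 1, so the term B_j * r_jj reduces to B_j.
                terms.append(f"B_{k}")
            else:
                # r_kj = r_jk, so write the smaller subscript first.
                lo, hi = min(j, k), max(j, k)
                terms.append(f"B_{k} r_{lo}{hi}")
        equations.append(f"r_y{j} = " + " + ".join(terms))
    return equations


for line in normal_equations(2):
    print(line)
```

For p=2 this prints the same pair of equations obtained in the two predictor derivation; calling `normal_equations(5)` produces the five equations of the five predictor case.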

Example for Five Predictors 

To exemplify the procedure for p predictors, we will work through
the solution of normal equations for five predictors. We will first
do it by the long method, and then show a solution by the shorter method.

A. The regression model is:

$$\hat{Z}_y = A + B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5$$

B. The function to be minimized is:

$$\sum (Z_y - \hat{Z}_y)^2 = \sum (Z_y - A - B_1 Z_1 - B_2 Z_2 - B_3 Z_3 - B_4 Z_4 - B_5 Z_5)^2$$

C. The normal equation for A is A = 0. To derive the normal equations
for $B_1$ through $B_5$, apply the 7 steps listed earlier for
finding the normal equation for any $B_j$ term. The amount
of algebra involved in doing this requires an efficient procedure.
One method that may be found useful is as follows:

1. write step 1 as $\sum (Z_y - \hat{Z}_y) = 0$



2. write the second step for each of the terms as:

$$\sum (Z_y - \hat{Z}_y) Z_1 = 0$$

$$\sum (Z_y - \hat{Z}_y) Z_2 = 0$$

$$\sum (Z_y - \hat{Z}_y) Z_3 = 0$$

$$\sum (Z_y - \hat{Z}_y) Z_4 = 0$$

$$\sum (Z_y - \hat{Z}_y) Z_5 = 0$$

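3. write step 3 for each of the terms as:

$$\sum (Z_y Z_1 - \hat{Z}_y Z_1) = 0$$

$$\sum (Z_y Z_2 - \hat{Z}_y Z_2) = 0$$

$$\sum (Z_y Z_3 - \hat{Z}_y Z_3) = 0$$

$$\sum (Z_y Z_4 - \hat{Z}_y Z_4) = 0$$

$$\sum (Z_y Z_5 - \hat{Z}_y Z_5) = 0$$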

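Steps 4, 5 and 6 (recall A=0):

$$\sum Z_y Z_1 = \sum (B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5) Z_1$$

$$\sum Z_y Z_2 = \sum (B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5) Z_2$$

$$\sum Z_y Z_3 = \sum (B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5) Z_3$$

$$\sum Z_y Z_4 = \sum (B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5) Z_4$$

$$\sum Z_y Z_5 = \sum (B_1 Z_1 + B_2 Z_2 + B_3 Z_3 + B_4 Z_4 + B_5 Z_5) Z_5$$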




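distribute the $Z_j$ term and the summation operator:

$$\sum Z_y Z_1 = B_1 \sum Z_1^2 + B_2 \sum Z_1 Z_2 + B_3 \sum Z_1 Z_3 + B_4 \sum Z_1 Z_4 + B_5 \sum Z_1 Z_5$$

$$\sum Z_y Z_2 = B_1 \sum Z_1 Z_2 + B_2 \sum Z_2^2 + B_3 \sum Z_2 Z_3 + B_4 \sum Z_2 Z_4 + B_5 \sum Z_2 Z_5$$

$$\sum Z_y Z_3 = B_1 \sum Z_1 Z_3 + B_2 \sum Z_2 Z_3 + B_3 \sum Z_3^2 + B_4 \sum Z_3 Z_4 + B_5 \sum Z_3 Z_5$$

$$\sum Z_y Z_4 = B_1 \sum Z_1 Z_4 + B_2 \sum Z_2 Z_4 + B_3 \sum Z_3 Z_4 + B_4 \sum Z_4^2 + B_5 \sum Z_4 Z_5$$

$$\sum Z_y Z_5 = B_1 \sum Z_1 Z_5 + B_2 \sum Z_2 Z_5 + B_3 \sum Z_3 Z_5 + B_4 \sum Z_4 Z_5 + B_5 \sum Z_5^2$$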



Applying rules for standard scores: 

$$(n-1) r_{y1} = B_1 (n-1) + B_2 (n-1) r_{12} + B_3 (n-1) r_{13} + B_4 (n-1) r_{14} + B_5 (n-1) r_{15}$$

$$(n-1) r_{y2} = B_1 (n-1) r_{12} + B_2 (n-1) + B_3 (n-1) r_{23} + B_4 (n-1) r_{24} + B_5 (n-1) r_{25}$$

$$(n-1) r_{y3} = B_1 (n-1) r_{13} + B_2 (n-1) r_{23} + B_3 (n-1) + B_4 (n-1) r_{34} + B_5 (n-1) r_{35}$$

$$(n-1) r_{y4} = B_1 (n-1) r_{14} + B_2 (n-1) r_{24} + B_3 (n-1) r_{34} + B_4 (n-1) + B_5 (n-1) r_{45}$$

$$(n-1) r_{y5} = B_1 (n-1) r_{15} + B_2 (n-1) r_{25} + B_3 (n-1) r_{35} + B_4 (n-1) r_{45} + B_5 (n-1)$$



Dividing through by (n-1) produces the simplified set of normal
equations:



$$r_{y1} = B_1 + B_2 r_{12} + B_3 r_{13} + B_4 r_{14} + B_5 r_{15}$$

$$r_{y2} = B_1 r_{12} + B_2 + B_3 r_{23} + B_4 r_{24} + B_5 r_{25}$$

$$r_{y3} = B_1 r_{13} + B_2 r_{23} + B_3 + B_4 r_{34} + B_5 r_{35}$$

$$r_{y4} = B_1 r_{14} + B_2 r_{24} + B_3 r_{34} + B_4 + B_5 r_{45}$$

$$r_{y5} = B_1 r_{15} + B_2 r_{25} + B_3 r_{35} + B_4 r_{45} + B_5$$



The short method for 5 predictors begins by writing the $5^2$ terms for
the general $r_{yj}$ normal equation:

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + B_4 r_{4j} + B_5 r_{5j}$$

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + B_4 r_{4j} + B_5 r_{5j}$$

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + B_4 r_{4j} + B_5 r_{5j}$$

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + B_4 r_{4j} + B_5 r_{5j}$$

$$r_{yj} = B_1 r_{1j} + B_2 r_{2j} + B_3 r_{3j} + B_4 r_{4j} + B_5 r_{5j}$$



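Substitute the appropriate j term (j=1 for line 1, j=2 for line 2, etc.)
and set $r_{11} = 1$, $r_{22} = 1$, and so on:

$$r_{y1} = B_1 + B_2 r_{21} + B_3 r_{31} + B_4 r_{41} + B_5 r_{51}$$

$$r_{y2} = B_1 r_{12} + B_2 + B_3 r_{32} + B_4 r_{42} + B_5 r_{52}$$

$$r_{y3} = B_1 r_{13} + B_2 r_{23} + B_3 + B_4 r_{43} + B_5 r_{53}$$

$$r_{y4} = B_1 r_{14} + B_2 r_{24} + B_3 r_{34} + B_4 + B_5 r_{54}$$

$$r_{y5} = B_1 r_{15} + B_2 r_{25} + B_3 r_{35} + B_4 r_{45} + B_5$$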

Now, if one desires, the subscripts of the predictor correlations can
be reversed such that the first subscript is less than the second. The
result is the same set of normal equations derived under the long
method.
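Once the five normal equations are written down, they form a system of five linear equations in the five beta weights. The following sketch shows one way to solve such a system numerically. It is not the method of this paper: it uses ordinary Gaussian elimination, and every number in it is hypothetical. The predictor intercorrelations are assumed to follow the pattern $r_{jk} = 0.5^{|j-k|}$, and the criterion correlations are generated from an assumed set of weights so that the solution can be checked against them.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot row.
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


p = 5
# Hypothetical predictor intercorrelations, with r_jj = 1 on the diagonal.
R = [[0.5 ** abs(j - k) for k in range(p)] for j in range(p)]
# Assumed beta weights, used only to generate criterion correlations.
true_betas = [0.30, 0.20, 0.10, 0.05, 0.25]
# Each r_yj follows from its normal equation: r_yj = sum over k of B_k * r_kj.
r_y = [sum(true_betas[k] * R[j][k] for k in range(p)) for j in range(p)]

betas = solve_linear_system(R, r_y)
print(betas)
```

Because the criterion correlations were built from `true_betas`, the solved weights should match them up to rounding error. With real data one would instead enter the observed correlations for `R` and `r_y`.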
NOTE

Three related papers that may be of interest to the reader are
in preparation for publication in the ERIC system. Tentative titles
and expected order of appearance are as follows:

1. A derivation of the sample multiple correlation formula
for raw scores.

2. A derivation of the unbiased sample standard error of estimate.

3. A matrix algebra technique for finding beta weights from a
correlation matrix.